Thesis

Datasets

UCI and Kaggle datasets

Keel datasets

Numerical

Binary

[1] "banknote"
[1] "haberman"
[1] "skin"
[1] "vertebral_column2"
[1] "weight_height"
[1] "audit_risk"
[1] "ionospheren"
[1] "sonar"
# A tibble: 8 x 9
  name  type  instances features num_cat classes class_names proportion
  <chr> <fct>     <int>    <dbl> <chr>     <int> <chr>       <chr>     
1 bank… nume…      1372        4 [4/0]         2 [1/0]       [0.44/0.5…
2 habe… nume…       306        3 [3/0]         2 [2/1]       [0.26/0.7…
3 skin  nume…    245057        3 [3/0]         2 [1/2]       [0.21/0.7…
4 vert… nume…       310        6 [6/0]         2 [Normal/Ab… [0.32/0.6…
5 weig… nume…     10000        2 [2/0]         2 [Male/Fema… [0.5/0.5] 
6 audi… nume…       776       24 [24/0]        2 [1/0]       [0.39/0.6…
7 iono… nume…       351       32 [32/0]        2 [b/g]       [0.36/0.6…
8 sonar nume…       208       60 [60/0]        2 [R/M]       [0.47/0.5…
# … with 1 more variable: imbalance_ratio <dbl>
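
The "< 10" and ">= 10" groups that follow split these datasets by number of features. A minimal sketch of that split, assuming a threshold of 10 features and using the feature counts printed in the table above (the function name `split_by_feature_count` is hypothetical and is not taken from the thesis code):

```python
# Feature counts copied from the summary table above.
FEATURES = {
    "banknote": 4,
    "haberman": 3,
    "skin": 3,
    "vertebral_column2": 6,
    "weight_height": 2,
    "audit_risk": 24,
    "ionospheren": 32,
    "sonar": 60,
}


def split_by_feature_count(features, threshold=10):
    """Return (low, high): names with fewer than `threshold` features, then the rest."""
    low = [name for name, n in features.items() if n < threshold]
    high = [name for name, n in features.items() if n >= threshold]
    return low, high


low, high = split_by_feature_count(FEATURES)
print(low)   # datasets for the "< 10" group
print(high)  # datasets for the ">= 10" group
```

With these counts the two lists reproduce the dataset names printed under the "< 10" and ">= 10" headings.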

< 10

[1] "banknote"
[1] "haberman"
[1] "skin"
[1] "vertebral_column2"
[1] "weight_height"
# A tibble: 5 x 9
  name  type  instances features num_cat classes class_names proportion
  <chr> <fct>     <int>    <dbl> <chr>     <int> <chr>       <chr>     
1 bank… nume…      1372        4 [4/0]         2 [1/0]       [0.44/0.5…
2 habe… nume…       306        3 [3/0]         2 [2/1]       [0.26/0.7…
3 skin  nume…    245057        3 [3/0]         2 [1/2]       [0.21/0.7…
4 vert… nume…       310        6 [6/0]         2 [Normal/Ab… [0.32/0.6…
5 weig… nume…     10000        2 [2/0]         2 [Male/Fema… [0.5/0.5] 
# … with 1 more variable: imbalance_ratio <dbl>

Banknote authentication

The data were extracted from images of genuine and forged banknote-like specimens. The specimens were digitized with an industrial camera of the kind usually used for print inspection. The final images are 400 x 400 pixels. Owing to the object lens and the distance to the investigated object, the resulting gray-scale pictures have a resolution of about 660 dpi. A wavelet transform tool was used to extract features from the images.

Description of the attributes:

  • variance: variance of wavelet transformed image numerical
  • skewness: skewness of wavelet transformed image numerical
  • curtosis: kurtosis of wavelet transformed image numerical
  • entropy: entropy of the image numerical
  • class:

Data
# A tibble: 1,372 x 5
   class variance skewness curtosis entropy
   <fct>    <dbl>    <dbl>    <dbl>   <dbl>
 1 0        3.62      8.67   -2.81   -0.447
 2 0        4.55      8.17   -2.46   -1.46 
 3 0        3.87     -2.64    1.92    0.106
 4 0        3.46      9.52   -4.01   -3.59 
 5 0        0.329    -4.46    4.57   -0.989
 6 0        4.37      9.67   -3.96   -3.16 
 7 0        3.59      3.01    0.729   0.564
 8 0        2.09     -6.81    8.46   -0.602
 9 0        3.20      5.76   -0.753  -0.613
10 0        1.54      9.18   -2.27   -0.735
# … with 1,362 more rows

Haberman

The dataset contains cases from a study that was conducted between 1958 and 1970 at the University of Chicago’s Billings Hospital on the survival of patients who had undergone surgery for breast cancer.

Description of the attributes:

  • age: Age of patient at time of operation numerical
  • year: Patient’s year of operation numerical
  • nodes: Number of positive axillary nodes detected numerical
  • class: Survival status (class attribute)
    • 1 = the patient survived 5 years or longer [positive]
    • 2 = the patient died within 5 years
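
The class coding above can be turned into a boolean target in a few lines. This is only a sketch: it assumes labels arrive as the integers 1 and 2 or as their string forms (the class column is printed as a factor), and the names `SURVIVAL_STATUS` and `is_positive` are hypothetical.

```python
# Class coding as listed above; class 1 is the one marked [positive].
SURVIVAL_STATUS = {
    1: "survived 5 years or longer",
    2: "died within 5 years",
}
POSITIVE_CLASS = 1


def is_positive(label):
    """Return True when the label denotes the positive class (1)."""
    code = int(label)  # accepts 1, 2, "1" or "2"
    if code not in SURVIVAL_STATUS:
        raise ValueError(f"unknown Haberman class: {label!r}")
    return code == POSITIVE_CLASS


print(is_positive("1"), SURVIVAL_STATUS[1])
print(is_positive(2), SURVIVAL_STATUS[2])
```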

Skin segmentation

The skin dataset was collected by randomly sampling B, G, R values from face images of various age groups (young, middle, and old), race groups (white, black, and Asian), and genders, obtained from the FERET database and the PAL database. The total sample size is 245057, of which 50859 are skin samples and 194198 are non-skin samples. The sources are the Color FERET Image Database and the PAL Face Database from the Productive Aging Laboratory, The University of Texas at Dallas.
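
The sample counts quoted above give the class proportions directly. The sketch below assumes a proportion is simply the class count divided by the total; the function name `class_proportions` is hypothetical.

```python
def class_proportions(counts):
    """Map each class label to its share of the total sample count."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Counts quoted in the description of the skin dataset.
skin_counts = {"skin": 50859, "non-skin": 194198}
assert sum(skin_counts.values()) == 245057

proportions = class_proportions(skin_counts)
for label, share in proportions.items():
    print(f"{label}: {share:.2f}")
# skin: 0.21
# non-skin: 0.79
```

The skin share of 0.21 agrees with the proportion column shown for this dataset in the summary table.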

Description of the attributes:

Vertebral column 2

Biomedical data set built by Dr. Henrique da Mota during a medical residency in the Group of Applied Research in Orthopaedics (GARO) of the Centre Médico-Chirurgical de Réadaptation des Massues, Lyon, France. The task consists of classifying patients into one of two categories: Normal (100 patients) or Abnormal (210 patients). The source also provides files for use within the WEKA environment.

The three-class version of the task classifies patients into one of three categories: Normal (100 patients), Disk Hernia (60 patients) or Spondylolisthesis (150 patients).

Description of the attributes:

Weight, height, gender

Weights and heights of males and females.

  • Source: Kaggle
  • Classification: binary
  • Input features: numerical
  • Number of rows: 10000
  • Number of attributes: 2

Description of the attributes:

>= 10

[1] "audit_risk"
[1] "ionospheren"
[1] "sonar"
# A tibble: 3 x 9
  name  type  instances features num_cat classes class_names proportion
  <chr> <fct>     <int>    <dbl> <chr>     <int> <chr>       <chr>     
1 audi… nume…       776       24 [24/0]        2 [1/0]       [0.39/0.6…
2 iono… nume…       351       32 [32/0]        2 [b/g]       [0.36/0.6…
3 sonar nume…       208       60 [60/0]        2 [R/M]       [0.47/0.5…
# … with 1 more variable: imbalance_ratio <dbl>

Audit risk

Many risk factors are examined from various areas, such as past records of the audit office, audit-paras, environmental conditions reports, firm reputation summaries, on-going issues reports, profit-value records, loss-value records and follow-up reports. After in-depth interviews with the auditors, the important risk factors are evaluated and their probability of existence is calculated from the present and past records.

The goal of the research is to help the auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors. The sectors and the number of firms in each are: Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37), Agriculture (200).

Description of the attributes:

EEG eye state

All data is from one numerical EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement and added later manually to the file after analysing the video frames. ‘1’ indicates the eye-closed and ‘0’ the eye-open state. All values are in chronological order with the first measured value at the top of the data.

Description of the attributes:

Ionospheren

This radar data was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. See the paper for more details. The targets were free electrons in the ionosphere. “Good” radar returns are those showing evidence of some type of structure in the ionosphere. “Bad” returns are those that do not; their signals pass through the ionosphere.

Received signals were processed using an autocorrelation function whose arguments are the time of a pulse and the pulse number. There were 17 pulse numbers for the Goose Bay system. Instances in this database are described by 2 attributes per pulse number, corresponding to the complex values returned by the function resulting from the complex electromagnetic signal.
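
Two attributes per pulse number over 17 pulse numbers gives 2 x 17 = 34 numerical attributes. The sketch below shows one way a list of complex values could be flattened into such attributes. The interleaved real/imaginary ordering and the example values are assumptions made for illustration, not the documented layout of the file.

```python
N_PULSE_NUMBERS = 17  # stated in the description above


def flatten_complex(values):
    """Flatten complex values into [re_0, im_0, re_1, im_1, ...] (assumed ordering)."""
    flat = []
    for z in values:
        flat.extend((z.real, z.imag))
    return flat


# Hypothetical autocorrelation outputs, one complex value per pulse number.
example = [complex(1.0, 0.0)] + [complex(0.5, -0.25)] * (N_PULSE_NUMBERS - 1)
attributes = flatten_complex(example)
print(len(attributes))  # 34
```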

Description of the attributes:

  • X1-X34: numerical
  • class:
    • Bad: [positive]
    • Good:

Sonar

The file “sonar.mines” contains 111 patterns obtained by bouncing sonar signals off a metal cylinder at various angles and under various conditions. The file “sonar.rocks” contains 97 patterns obtained from rocks under similar conditions. The transmitted sonar signal is a frequency-modulated chirp, rising in frequency. The data set contains signals obtained from a variety of different aspect angles, spanning 90 degrees for the cylinder and 180 degrees for the rock.

Each pattern is a set of 60 numbers in the range 0.0 to 1.0. Each number represents the energy within a particular frequency band, integrated over a certain period of time. The integration aperture for higher frequencies occurs later in time, since these frequencies are transmitted later during the chirp.

The label associated with each record contains the letter “R” if the object is a rock and “M” if it is a mine (metal cylinder). The numbers in the labels are in increasing order of aspect angle, but they do not encode the angle directly.
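
The record layout described above (60 energies in the range 0.0 to 1.0 followed by an “R” or “M” label) can be checked while parsing. This is a minimal sketch that assumes a comma-separated line with the label in the last field; the function name `parse_sonar_record` and the example line are hypothetical.

```python
LABELS = {"R": "rock", "M": "mine (metal cylinder)"}
N_BANDS = 60


def parse_sonar_record(line):
    """Parse one comma-separated record into (energies, label), validating both."""
    *values, label = [field.strip() for field in line.split(",")]
    energies = [float(v) for v in values]
    if len(energies) != N_BANDS:
        raise ValueError(f"expected {N_BANDS} energies, got {len(energies)}")
    if not all(0.0 <= e <= 1.0 for e in energies):
        raise ValueError("energies must lie in the range 0.0 to 1.0")
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return energies, label


# Illustrative record, not real data: 60 identical energies and a rock label.
example_line = ",".join(["0.5"] * N_BANDS + ["R"])
energies, label = parse_sonar_record(example_line)
print(len(energies), LABELS[label])
```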

Description of the attributes:

Multiclass

< 10

Ecoli

Description of the dataset

Description of the attributes:

Iris

The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Predicted attribute: class of iris plant.

This is an exceedingly simple domain.

This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick ‘@’ espeedaz.net ). The 35th sample should be: 4.9,3.1,1.5,0.2,“Iris-setosa” where the error is in the fourth feature. The 38th sample should be: 4.9,3.6,1.4,0.1,“Iris-setosa” where the errors are in the second and third features.
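
The two corrections above can be applied programmatically before the data is used. The sketch assumes the rows are held in file order and that "35th" and "38th" count from 1; the function name `apply_corrections` and the placeholder rows are hypothetical.

```python
# Corrected rows quoted in the text above, keyed by 1-based sample number.
CORRECTIONS = {
    35: (4.9, 3.1, 1.5, 0.2, "Iris-setosa"),
    38: (4.9, 3.6, 1.4, 0.1, "Iris-setosa"),
}


def apply_corrections(rows, corrections=CORRECTIONS):
    """Return a copy of `rows` with the listed samples replaced."""
    fixed = list(rows)
    for sample_number, row in corrections.items():
        fixed[sample_number - 1] = row  # convert 1-based number to 0-based index
    return fixed


# Placeholder rows standing in for the first 50 samples (not real data).
placeholder = [(0.0, 0.0, 0.0, 0.0, "Iris-setosa")] * 50
corrected = apply_corrections(placeholder)
print(corrected[34])
print(corrected[37])
```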

Description of the attributes:

Life expectancy

This dataset contains 6 columns and 223 rows. Each row corresponds to a country, in order of life expectancy rank. The dataset has three numeric columns: Overall Life Expectancy, Male Life Expectancy and Female Life Expectancy. The last column is Continent, which gives the continent the country lies in and can serve as the class for the data.

This data can be used for classification with techniques such as SVM (linear), KNN and C4.5, as well as other supervised and unsupervised techniques.

  • Source: Kaggle
  • Classification: multiclass
  • Input features: numerical
  • Number of rows: 223
  • Number of attributes: 3

Description of the attributes:

Seeds

The examined group comprised kernels belonging to three different varieties of wheat: Kama, Rosa and Canadian, 70 elements each, randomly selected for the experiment. High quality visualization of the internal kernel structure was detected using a soft X-ray technique. It is non-destructive and considerably cheaper than other more sophisticated imaging techniques like scanning microscopy or laser technology. The images were recorded on 13x18 cm X-ray KODAK plates. Studies were conducted using combine harvested wheat grain originating from experimental fields, explored at the Institute of Agrophysics of the Polish Academy of Sciences in Lublin.

The data set can be used for the tasks of classification and cluster analysis.

Description of the attributes:

Vertebral column 3

Refer to vertebral column 2, although a description should still be written here.

Wifi localization

The data were collected to study how Wi-Fi signal strengths can be used to determine the indoor location.

Description of the attributes:

Yeast

Description of the dataset

The original has one more attribute, Sequence Name: accession number for the SWISS-PROT database.

Description of the attributes:

  • mcg: McGeoch’s method for signal sequence recognition. numerical
  • gvh: von Heijne’s method for signal sequence recognition. numerical
  • alm: Score of the ALOM membrane spanning region prediction program. numerical
  • mit: Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins. numerical
  • erl: Presence of “HDEL” substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute. numerical
  • pox: Peroxisomal targeting signal in the C-terminus. numerical
  • vac: Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins. numerical
  • nuc: Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins. numerical
  • class:
    • CYT
    • ERL
    • EXC
    • ME1
    • ME2
    • ME3
    • MIT
    • NUC
    • POX
    • VAC

>= 10

Mixed

Binary

< 10

acute_inflammations1

acute_inflammations2

caesarian

mini_mammographic_masses

>= 10

statlog

primary_tumor

Multiclass

< 10

abalone

teaching_assistant

>= 10

contraceptive

Categorical

Binary

balance_scale

breast_cancer

mini_cars

somerville

mini_tic_tac_toe

Multiclass

post_operative

mini_connect4

soybean_large

zoo

Keel

Imbalanced

Imbalanced data sets are a special case of classification problem in which the class distribution is not uniform among the classes. Typically, they are composed of two classes: the majority (negative) class and the minority (positive) class.
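
The groups below are organised by imbalance ratio. The sketch assumes the usual definition, the size of the majority class divided by the size of the minority class, and takes its thresholds from the group headings ("between 1.5 and 9" and "higher than 9"). The function names are hypothetical, and the computation behind the `imbalance_ratio` column of the tables may differ.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority class size divided by minority class size (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("at least two classes are required")
    return max(counts.values()) / min(counts.values())


def imbalance_group(ratio):
    """Assign a ratio to one of the groups used in the headings below."""
    if ratio > 9:
        return "higher than 9"
    if ratio >= 1.5:
        return "between 1.5 and 9"
    return "below 1.5"


# iris0 has 150 instances split roughly 0.33/0.67, i.e. 50 positive and 100 negative.
iris0_labels = ["positive"] * 50 + ["negative"] * 100
ratio = imbalance_ratio(iris0_labels)
print(ratio, imbalance_group(ratio))  # 2.0 between 1.5 and 9
```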

Binary

Binary: Imbalance ratio between 1.5 and 9

[1] "ecoli_0_vs_1"
[1] "iris0"
[1] "glass0"
[1] "glass1"
[1] "glass6"
[1] "haberman"
[1] "iris0"
[1] "wisconsin"
# A tibble: 8 x 9
  name  type  instances features num_cat classes class_names proportion
  <chr> <fct>     <int>    <dbl> <chr>     <int> <chr>       <chr>     
1 ecol… nume…       220        7 [7/0]         2 [negative/… [0.35/0.6…
2 iris0 nume…       150        4 [4/0]         2 [positive/… [0.33/0.6…
3 glas… nume…       214        9 [9/0]         2 [positive/… [0.33/0.6…
4 glas… nume…       214        9 [9/0]         2 [positive/… [0.36/0.6…
5 glas… nume…       214        9 [9/0]         2 [positive/… [0.14/0.8…
6 habe… nume…       306        3 [3/0]         2 [2/1]       [0.26/0.7…
7 iris0 nume…       150        4 [4/0]         2 [positive/… [0.33/0.6…
8 wisc… nume…       683        9 [9/0]         2 [positive/… [0.35/0.6…
# … with 1 more variable: imbalance_ratio <dbl>
ecoli_0_vs_1
     class  Mcg  Gvh  Lip Chg  Aac Alm1 Alm2
1 positive 0.49 0.29 0.48 0.5 0.56 0.24 0.35
2 positive 0.07 0.40 0.48 0.5 0.54 0.35 0.44
3 positive 0.56 0.40 0.48 0.5 0.49 0.37 0.46
4 positive 0.59 0.49 0.48 0.5 0.52 0.45 0.36
5 positive 0.23 0.32 0.48 0.5 0.55 0.25 0.35
6 positive 0.67 0.39 0.48 0.5 0.36 0.38 0.46
7 positive 0.29 0.28 0.48 0.5 0.44 0.23 0.34
8 positive 0.21 0.34 0.48 0.5 0.51 0.28 0.39
9 positive 0.20 0.44 0.48 0.5 0.46 0.51 0.57
 [ reached 'max' / getOption("max.print") -- omitted 211 rows ]

glass0
     class       RI       Na      Mg      Al      Si       K       Ca Ba
1 positive 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931  8.04468  0
2 positive 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205  8.52888  0
3 positive 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178  9.57260  0
4 positive 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520  0
5 positive 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132  8.27064  0
6 positive 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694  9.40044  0
7 positive 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132  8.23836  0
      Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
 [ reached 'max' / getOption("max.print") -- omitted 207 rows ]

glass1
     class       RI       Na      Mg      Al      Si       K       Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931  8.04468  0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205  8.52888  0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178  9.57260  0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520  0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132  8.27064  0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694  9.40044  0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132  8.23836  0
      Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
 [ reached 'max' / getOption("max.print") -- omitted 207 rows ]

glass6
     class       RI       Na      Mg      Al      Si       K       Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931  8.04468  0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205  8.52888  0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178  9.57260  0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520  0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132  8.27064  0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694  9.40044  0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132  8.23836  0
      Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
 [ reached 'max' / getOption("max.print") -- omitted 207 rows ]

haberman
# A tibble: 306 x 4
   class   age  year nodes
   <fct> <dbl> <dbl> <dbl>
 1 1        30    64     1
 2 1        30    62     3
 3 1        30    65     0
 4 1        31    59     2
 5 1        31    65     4
 6 1        33    58    10
 7 1        33    60     0
 8 2        34    59     0
 9 2        34    66     9
10 1        34    58    30
# … with 296 more rows

iris0
      class SepalLength SepalWidth PetalLength PetalWidth
1  positive         5.1        3.5         1.4        0.2
2  positive         4.9        3.0         1.4        0.2
3  positive         4.6        3.1         1.5        0.2
4  positive         5.0        3.6         1.4        0.2
5  positive         5.4        3.9         1.7        0.4
6  positive         4.6        3.4         1.4        0.3
7  positive         5.0        3.4         1.5        0.2
8  positive         4.4        2.9         1.4        0.2
9  positive         5.4        3.7         1.5        0.2
10 positive         4.8        3.4         1.6        0.2
11 positive         4.8        3.0         1.4        0.1
12 positive         4.3        3.0         1.1        0.1
13 positive         5.7        4.4         1.5        0.4
14 positive         5.4        3.9         1.3        0.4
15 positive         5.1        3.5         1.4        0.3
 [ reached 'max' / getOption("max.print") -- omitted 135 rows ]

wisconsin
     class ClumpThickness CellSize CellShape MarginalAdhesion
1 negative              6        1         1                1
2 negative              6        5         5                6
3 negative              4        1         1                1
4 negative              7        9         9                1
5 negative              5        1         1                4
6 positive              9        2         2                9
7 negative              1        1         1                1
  EpithelialSize BareNuclei BlandChromatin NormalNucleoli Mitoses
1              3          1              4              1       1
2              8          2              4              3       1
3              3          3              4              1       1
4              4          5              4              8       1
5              3          1              4              1       1
6              8          2             10              8       1
7              3          2              4              1       1
 [ reached 'max' / getOption("max.print") -- omitted 676 rows ]

Binary: Imbalance ratio higher than 9

ecoli4
     class  Mcg  Gvh  Lip Chg  Aac Alm1 Alm2
1 negative 0.49 0.29 0.48 0.5 0.56 0.24 0.35
2 negative 0.07 0.40 0.48 0.5 0.54 0.35 0.44
3 negative 0.56 0.40 0.48 0.5 0.49 0.37 0.46
4 negative 0.59 0.49 0.48 0.5 0.52 0.45 0.36
5 negative 0.23 0.32 0.48 0.5 0.55 0.25 0.35
6 negative 0.67 0.39 0.48 0.5 0.36 0.38 0.46
7 negative 0.29 0.28 0.48 0.5 0.44 0.23 0.34
8 negative 0.21 0.34 0.48 0.5 0.51 0.28 0.39
9 negative 0.20 0.44 0.48 0.5 0.46 0.51 0.57
 [ reached 'max' / getOption("max.print") -- omitted 327 rows ]

ecoli_0_1_4_6_vs_5
      class a1 a2 a3 a5 a6 a7
1  negative 49 29 48 56 24 35
2  negative  7  4 48 54 35 44
3  negative 56  4 48 49 37 46
4  negative 59 49 48 52 45 36
5  negative 23 32 48 55 25 35
6  negative 67 39 48 36 38 46
7  negative 29 28 48 44 23 34
8  negative 21 34 48 51 28 39
9  negative  2 44 48 46 51 57
10 negative 42  4 48 56 18  3
 [ reached 'max' / getOption("max.print") -- omitted 270 rows ]

ecoli_0_1_4_7_vs_2_3_5_6
     class a1 a2 a3 a4 a5 a6 a7
1 negative 49 29 48  5 56 24 35
2 negative  7  4 48  5 54 35 44
3 negative 56  4 48  5 49 37 46
4 negative 59 49 48  5 52 45 36
5 negative 23 32 48  5 55 25 35
6 negative 67 39 48  5 36 38 46
7 negative 29 28 48  5 44 23 34
8 negative 21 34 48  5 51 28 39
9 negative  2 44 48  5 46 51 57
 [ reached 'max' / getOption("max.print") -- omitted 327 rows ]

ecoli_0_1_4_7_vs_5_6
      class a1 a2 a3 a5 a6 a7
1  negative 49 29 48 56 24 35
2  negative  7  4 48 54 35 44
3  negative 56  4 48 49 37 46
4  negative 59 49 48 52 45 36
5  negative 23 32 48 55 25 35
6  negative 67 39 48 36 38 46
7  negative 29 28 48 44 23 34
8  negative 21 34 48 51 28 39
9  negative  2 44 48 46 51 57
10 negative 42  4 48 56 18  3
 [ reached 'max' / getOption("max.print") -- omitted 322 rows ]

ecoli_0_6_7_vs_5
      class a1 a2 a3 a5 a6 a7
1  negative 49 29 48 56 24 35
2  negative  7  4 48 54 35 44
3  negative 56  4 48 49 37 46
4  negative 59 49 48 52 45 36
5  negative 23 32 48 55 25 35
6  negative 67 39 48 36 38 46
7  negative 29 28 48 44 23 34
8  negative 21 34 48 51 28 39
9  negative  2 44 48 46 51 57
10 negative 42  4 48 56 18  3
 [ reached 'max' / getOption("max.print") -- omitted 210 rows ]

glass2
     class      RI    Na   Mg   Al    Si    K   Ca   Ba   Fe
1 negative 1.51673 13.30 3.64 1.53 72.53 0.65 8.03 0.00 0.29
2 negative 1.51750 12.82 3.55 1.49 72.75 0.54 8.52 0.00 0.19
3 negative 1.51775 12.85 3.48 1.23 72.97 0.61 8.56 0.09 0.22
4 negative 1.51646 13.41 3.55 1.25 72.81 0.68 8.10 0.00 0.00
5 negative 1.51761 12.81 3.54 1.23 73.24 0.58 8.39 0.00 0.00
6 negative 1.51846 13.41 3.89 1.33 72.38 0.51 8.28 0.00 0.00
7 negative 1.51811 13.33 3.85 1.25 72.78 0.52 8.12 0.00 0.00
 [ reached 'max' / getOption("max.print") -- omitted 207 rows ]

glass4
     class       RI       Na      Mg      Al      Si       K       Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931  8.04468  0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205  8.52888  0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178  9.57260  0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520  0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132  8.27064  0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694  9.40044  0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132  8.23836  0
      Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
 [ reached 'max' / getOption("max.print") -- omitted 207 rows ]

glass5
     class       RI       Na      Mg      Al      Si       K       Ca Ba
1 negative 1.515888 12.87795 3.43036 1.40066 73.2820 0.68931  8.04468  0
2 negative 1.517642 12.97770 3.53812 1.21127 73.0020 0.65205  8.52888  0
3 negative 1.522130 14.20795 3.82099 0.46976 71.7700 0.11178  9.57260  0
4 negative 1.522221 13.21045 3.77160 0.79076 71.9884 0.13041 10.24520  0
5 negative 1.517551 13.39000 3.65935 1.18880 72.7892 0.57132  8.27064  0
6 negative 1.520991 13.68925 3.59200 1.12139 71.9604 0.08694  9.40044  0
7 negative 1.517551 13.15060 3.60996 1.05077 73.2372 0.57132  8.23836  0
      Fe
1 0.1224
2 0.0000
3 0.0000
4 0.0000
5 0.0561
6 0.0000
7 0.0000
 [ reached 'max' / getOption("max.print") -- omitted 207 rows ]

Multiclass

Noisy

Class noise

Attribute noise

Description of the dataset

Description of the attributes:

Noelia Rico

2019-07-21